Cluster Services view

The Cluster Services view enables you to monitor the status of the servers that make up your N4 installation cluster, across yards if you have multiple yards.

You can monitor the progress of all services within the network at the scope at which you are currently logged in. If you log in at the Operator or Complex level, the Cluster Services view is not available. If you log in at the Facility or Yard level, you see all the services available at the single-yard level, as well as the services for multiple components that N4 and other components use. You can create a filtered view of the table to limit the display.

The columns list detailed information about each server. From the Actions menu, you can generate a report that compares the Bridge's in-memory model with the current database contents.

Access to the Cluster Services view is provided through the privilege: ADMIN_SYSTEM_MONITOR.

The N4 interface displays the name of the current N4 instance in the status bar next to the current scope.

Each yard installation has one XPS/Bridge host, a Center node host, a Standby Center node host, N4-cluster node hosts, and a database host. One or more XPS client workstations connect to the XPS server.

The Cluster Services view displays a record for each service and network port within the user's current scope. This includes a record for each N4 node in the cluster, the XPS server, and each XPS client. Some services, such as XPS and the Bridge daemon, present multiple services that are used by N4 or other components, and the list shows these services on separate rows.

The N4 setting ARGOCORE003 (CLUSTER_SERVICE_REFRESH_FREQUENCY_IN_SECONDS) sets how frequently the Cluster Services view automatically refreshes. The default is 30 seconds.

When the first N4 Cluster node starts, only that node appears in the Cluster Services view. Other Cluster nodes appear as they start and join the cluster. Once the active N4 Center node starts, it removes all inactive services from the Cluster Services view, so that only active services are displayed. The required start-up sequence is to start all of the Cluster nodes first and then the Center node. As a result, after a shutdown the first Cluster node may appear in the Cluster Services view along with stale services that display as active but are actually inactive. These stale services stop appearing in the Cluster Services view only when the active Center node starts.

The monitoring function in the Cluster Services view only considers ACTIVE and INACTIVE services based on the following descriptions:

You can remove nodes from the Cluster Services view if they have a status of INACTIVE or SHUTDOWN. To remove a node, select it, right-click, and select Delete from the menu that appears. N4 does not display the Delete option if the node is ACTIVE. If the heartbeat monitor later detects a heartbeat from a node that has been deleted from the view, the node appears again with the ACTIVE status.

N4 does not allow you to delete nodes that are part of a Job Group.
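The deletion rules above can be summarized as a small predicate. This is only an illustrative sketch in Python, not N4 code: the `NodeRow` class, its field names, and the `in_job_group` flag are hypothetical names introduced here for the example.

```python
from dataclasses import dataclass

# Statuses for which the text says a node can be removed from the view.
DELETABLE_STATUSES = {"INACTIVE", "SHUTDOWN"}


@dataclass
class NodeRow:
    """Hypothetical stand-in for one row of the Cluster Services view."""
    name: str
    status: str
    in_job_group: bool = False  # assumed flag: node belongs to a Job Group


def can_delete(row: NodeRow) -> bool:
    """Return True if the row meets the documented conditions for Delete."""
    return row.status in DELETABLE_STATUSES and not row.in_job_group
```

For example, `can_delete(NodeRow("node-1", "INACTIVE"))` is true, while an ACTIVE node or a node in a Job Group is rejected.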

For information on starting up and shutting down the N4 system, see startup and shutdown procedures.

XPS acquires all reference data from N4 and writes out a codes.txt file. In production, XPS runs in 'bridged' mode (controlled with the persister_model setting in the server section of the settings.xml file). For testing purposes only, you can run XPS in 'file' mode instead. In 'file' mode, XPS reads the reference data from codes.txt on startup.

The clusterServicesDiagnostics MBean replicates, in your system monitoring tool (such as JConsole or Zabbix), the same information seen in the Cluster Services view. To see more information, go to Administration > Debug > Node Info Desk view, select the node, and choose Actions > Node Attributes. In the Diagnostics view for [node name], open clusterServiceDiagnosticsMBean and click each of the MBean's attributes. Details appear in the right pane.

 

Cluster Services view - Actions menu options

The Actions menu option is:

This command is only available when logged in at Yard scope, because it pertains to a specific yard.

 

Columns

The following table lists the various columns in the list view:

Column Name

Details

ID

The service ID, including the scope (Operator/Complex/Facility/Yard) in which the service operates, if relevant. You can only see objects at a scope lower than the one you are logged in to.

You can filter the table to select a specific scope of items to view.

Name

The name of the node. Some nodes run multiple services.

  • xps - XPS or its sub-services

  • bridge - Sub-services to the bridge daemon

  • ECN4 - ECN4 or its sub-services

  • ECN4WebService - ECN4Web server

  • [Operator]-Node n - Sub-services to N4 nodes – for example, Bento Server services

  • [IP Address]:[Port] - IP address for active XPS clients

  • [port]@[machine name] - machine name for active node
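As a rough illustration of how the Name patterns above could be told apart, for instance in a script that post-processes an exported list, here is a minimal Python sketch. The regular expressions, the returned labels, and the exact spelling assumed for "[Operator]-Node n" are assumptions made for this example, not formats confirmed by N4.

```python
import re

# Fixed names listed above, compared case-insensitively (an assumption).
_FIXED_NAMES = {"xps", "bridge", "ecn4", "ecn4webservice"}

# Assumed shapes for the bracketed patterns in the list above.
_IP_PORT = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}):(\d+)$")   # [IP Address]:[Port]
_PORT_AT_HOST = re.compile(r"^(\d+)@(.+)$")                   # [port]@[machine name]
_OPERATOR_NODE = re.compile(r"^(.+)-Node\s*(\S+)$")           # [Operator]-Node n


def classify_name(name: str) -> str:
    """Return a hypothetical label describing which Name pattern matched."""
    if name.lower() in _FIXED_NAMES:
        return name.lower()
    if _IP_PORT.match(name):
        return "xps-client"
    if _PORT_AT_HOST.match(name):
        return "active-node"
    if _OPERATOR_NODE.match(name):
        return "n4-node"
    return "unknown"
```

A name such as `10.0.0.5:13101` would be labeled `xps-client`, and `8080@host1` would be labeled `active-node`; both sample values are invented.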

Type

The type of service provided. For example, XPS provides a gate service for N4 and a message service for ECN4. For information about specific services within each type, see the List of Known Services table below.

  • BridgeControl

  • BridgeDaemon

  • BridgeService

  • Center Node

  • Cluster Node

  • ECN4Daemon

  • ECN4WebService

  • Ecn4BentoServerService

  • ExpertDeckerService

  • KafkaServer

  • N4BentoService

  • N4CacheMaster

  • N4GateService

  • N4RestWebService

  • XPS-Client

  • XDService

  • XMLRDTService

  • XPSControl

  • XPSDaemon

  • XPSMessageService

Status

The latest self-reported state of the service. Normally a running service is ACTIVE. When it is starting up it may go through phases such as STARTING, INITIALIZING, or LOADING. When a service shuts down cleanly, it sets the state to SHUTDOWN. If a service shuts down unexpectedly, it does not have an opportunity to set the state SHUTDOWN, in which case it may appear ACTIVE, but closer inspection will show its Heartbeat column is no longer updating. Some nodes (such as XPS client) just remove themselves from the list rather than stay there showing SHUTDOWN; this is the case for clients because they come and go frequently and do not provide services to other nodes.

The self-reported statuses of the different services include:

  • LOADING (Light Green): The service is loading the cache into the dynamic memory.

  • WAITING (Yellow): The service is waiting for the first N4 server to load the cache into the dynamic memory.

  • ACTIVE (Green): The service is active and operating normally.

  • RECOVERING (Gray): The service is recovering after an error such as a server crash.

  • INITIALIZING (Gray): The service is initializing using data from the cache.

  • SHUTDOWN (Red): The service had a clean shutdown.

  • INACTIVE (Orange): This does not necessarily mean the service is dead. It means that a heartbeat has not been received in the past two minutes. This inactivity threshold is hard-coded as two minutes; you cannot configure this value.
    For KafkaServer, the Kafka broker was unreachable in the past 30 seconds.

  • DISCONNECTED (Red): The ClusterNode or the BridgeDaemon type service has a heartbeat, but the heartbeat hasn't reached the Center node for the past two minutes. Check the network connectivity on that host.
    For KafkaServer, the Kafka broker has been inactive for more than 2 minutes and 30 seconds.
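The time-based rules in the list above can be sketched as follows. This is an illustrative Python approximation, not the N4 monitoring logic: the function name and arguments are hypothetical, and it assumes that the KafkaServer thresholds can be measured as time since the last successful contact. The DISCONNECTED case for Cluster Node and BridgeDaemon services is omitted because it depends on whether the heartbeat reached the Center node, which a single timestamp cannot show.

```python
from datetime import datetime, timedelta

INACTIVE_AFTER = timedelta(minutes=2)                 # hard-coded per the text
KAFKA_INACTIVE_AFTER = timedelta(seconds=30)          # from the KafkaServer note
KAFKA_DISCONNECTED_AFTER = timedelta(minutes=2, seconds=30)


def observed_status(service_type: str, reported_status: str,
                    last_heartbeat: datetime, now: datetime) -> str:
    """Combine the self-reported status with the age of the last heartbeat."""
    age = now - last_heartbeat
    if service_type == "KafkaServer":
        if age > KAFKA_DISCONNECTED_AFTER:
            return "DISCONNECTED"
        if age >= KAFKA_INACTIVE_AFTER:
            return "INACTIVE"
        return reported_status
    if reported_status == "SHUTDOWN":
        return "SHUTDOWN"          # a clean shutdown stays SHUTDOWN
    if age >= INACTIVE_AFTER:
        return "INACTIVE"          # no heartbeat in the past two minutes
    return reported_status
```

With these assumptions, a service that still reports ACTIVE but whose last heartbeat is three minutes old is treated as INACTIVE, which matches the stale-heartbeat case described for the Status column.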

Ack Delay

The average duration of the JMS message-consuming processes. This column displays values for the Cluster Node and BridgeDaemon types only, and only when you are logged in to N4 on the Center node. This column can help you quickly identify a node that is processing slowly. Two values are shown:

  • N4: the average duration for consuming XPS updates.

  • A4: the average duration for consuming ECI updates.

On each cluster node, there are multiple consuming processes for JMS messages. For each consuming process, the average time it takes for the center node to dispatch a message and receive the acknowledgment (ACK) is calculated over the last five messages within the past 10 minutes. The Ack Delay column shows the worst averages for the N4 and A4 consuming processes.

When the worst average exceeds the threshold, the Ack Delay cell background is red. When you see Ack Delay turn red, you should check the health of that node. 

By default, the threshold is 500 milliseconds. You can change the threshold with these settings in the Settings view (Administration > Settings > Settings):

  • ARGOCORE006 (CLUSTER_SERVICE_N4_ACK_DELAY_THRESHOLD_IN_MILLI_SECONDS)

  • ARGOCORE007 (CLUSTER_SERVICE_A4_ACK_DELAY_THRESHOLD_IN_MILLI_SECONDS)
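To make the Ack Delay calculation concrete, here is a minimal Python sketch of one possible reading of the description: for each consuming process, average the last five samples that fall within the past 10 minutes, take the worst (largest) average, and compare it with the threshold. The function names and the `(timestamp_seconds, delay_ms)` sample layout are assumptions for illustration; the actual N4 computation may differ in detail.

```python
from statistics import mean

WINDOW_SECONDS = 600          # "within the past 10 minutes"
SAMPLE_COUNT = 5              # "the last five messages"
DEFAULT_THRESHOLD_MS = 500    # default threshold stated in the text


def process_average(samples, now):
    """Average delay (ms) of the last five samples inside the window.

    samples: list of (timestamp_seconds, delay_ms) tuples for one
    consuming process. Returns None if no sample is recent enough.
    """
    recent = [delay for ts, delay in sorted(samples) if now - ts <= WINDOW_SECONDS]
    last = recent[-SAMPLE_COUNT:]
    return mean(last) if last else None


def worst_average(processes, now):
    """Largest per-process average across all consuming processes."""
    averages = [process_average(samples, now) for samples in processes]
    averages = [a for a in averages if a is not None]
    return max(averages) if averages else None


def exceeds_threshold(worst, threshold_ms=DEFAULT_THRESHOLD_MS):
    """True when the cell would be highlighted under this sketch."""
    return worst is not None and worst > threshold_ms
```

In this sketch, one slow consuming process is enough to push the node over the threshold, because the worst average is the value compared, not the mean across processes.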

IP

The IP address of the service, and if applicable, the port number on which it listens for connections.

Port

The port number on which the service listens for connections. Not all services provide a port for other services to talk to, so they do not display a port number. The various N4 messaging components use this information to locate each other.

Version

The build version of the service, if self-reported by the service. The format of this string varies depending on the technology on which the service is built.

Info

This column is a place where a service can post any information it wishes. Currently the N4CacheMaster nodes report a string containing some basic statistics about how that node views the cache.

  • "o#" is the size of the object's cache. The object counts that are logged on the Cluster Services view and those that are shown in the Bridge log at start-up can differ from each other by a small amount. This is because N4 reporting depends on an in-memory Bridge cache. There is also a time lag between when the counts are read by the log reader and the heartbeat emitter and when they appear on the Cluster Services view in N4.

  • "u" is the number of cache updates the node has processed.

  • "i" is the number of integration errors the node has posted while processing cache updates.

  • "inq#" shows the input queue size for all inputs to the Bridge Daemon.

  • "xoq#" shows the output queue size for data being sent from Bridge Daemon to XPS. This was formerly labeled, "outq#".

  • "noq#" shows the output queue size for data being sent from Bridge Daemon to the N4 cache.

  • "eoq#" shows the output queue size for data being sent from Bridge Daemon to ECN4.

The N4RestWebService uses this column to display the web service path.

Startup

Date/time when the service was started.

Heartbeat

A timestamp emitted by the service at a regular interval (at least once per minute). This is the time of the latest heartbeat posted by the service.

In the event of a system crash, you can use this to determine the time the problem started.

Activity

Some services report the date/time that they last saw a journal entry/activity.

Shutdown

The date/time when the state went into SHUTDOWN.

Some service nodes (such as SPARCSClient) remove their row from the display rather than setting the state to SHUTDOWN and displaying a time stamp.

User

For the service Type SPARCSClient, the User ID of the logged-on user. Currently, the other service types do not have a user associated with them.

Memory Used

The amount of memory, in megabytes, the service or node is using.

The ability to do this is specific to the technology the service uses. The JVM-based services (N4CacheMaster and Bridge services) report the JVM memory used and total size. XPS, when running on Windows, can report the operating system's total memory used and total memory size.

Memory Max

The total memory size, in megabytes, of the service or node.

Monitoring Node

The node that is currently performing the status updates for the Cluster Services view.

 

 

Known Services

List of Known Services

| Type | Name | Description | Implications If Not ACTIVE* ** |
| --- | --- | --- | --- |
| N4CacheMaster | (administrator-assigned N4 node name) | N4 interface to the cache for the yard; one per yard per N4 Node. | If not present and ACTIVE, that N4 node is not running. |
| N4BentoService | (administrator-assigned N4 node name) | N4 background job that processes messages sent by XPS or ECN4; one per yard. Starts automatically, or can be manually started in the N4 UI. | If not present and ACTIVE, the Bento messages from XPS or ECN4 fail. |
| BridgeDaemon | bridge | The bridge daemon's primary service and cache interface. | If not present and ACTIVE, bridge daemon is not running. |
| BridgeService | bridge | The bridge daemon's listener to which XPS connects. | If not present and ACTIVE, XPS cannot complete startup. |
| BridgeControl | bridge | The bridge daemon's debug command line. Defaults to port 12000. | -- for developer use only |
| XPSDaemon | xps | The XPS server process. | If not present and ACTIVE, XPS is not running. |
| XPSControl | xps | The XPS server's debug command line. | -- for developer use only |
| N4GateService | xps | The XPS server's listener to which N4 connects to perform gate transactions. | If not present and ACTIVE, N4 gate operations fail. |
| XPSMessageService | xps | The XPS server's listener to which Live View clients connect. | If not present and ACTIVE, Live View clients cannot connect. |
| ExpertDeckerService (also known as "ECN4 XD server") | (xps; or a SPARCS client IP address) | The listener that processes decking requests; may run on XPS or on a SPARCS client. If XD is busy, ECN4 makes the decking request to XPS instead. | If not present and ACTIVE, ECN4 relies on XPS to refine the target position for the containers upon dispatch, which has performance implications. ECN4 also relies on XPS to offer an optimal position when rehandling containers, which is also a performance concern. |
| XDService (also known as "N4 XD server") | (XPS client IP address) | The listener that processes vessel discharge decking requests from N4; runs on an XPS client. If an N4 XD server is not available, N4 sends the vessel discharge decking request to XPS instead. | If not present and ACTIVE, N4 relies on XPS to assign positions in the yard for vessel discharges. Large terminals, after consultation with Navis, can use a Scaled N4 XD service to free up XPS processing bandwidth. |
| SPARCSClient | (XPS client IP address) | An XPS client connected to XPS. | Present when the client is running. |
| ECN4Daemon | ecn4 | The ECN4 daemon's primary service and cache interface. | If not present and ACTIVE, ECN4 is not running. |
| ECN4WebService | ECN4WebService | The ECN4Web daemon's primary service. If there are two instances of ECN4Web service running (recommended for sites with more than 50 ECN4Web clients), the ID column in the N4 Cluster Services view displays both ECN4Web instances, each appended with its IP address and port ID. You can optionally set the scope for this service in the ECN4Web application.properties file. If you do not set it there, ECN4 intercepts the network node message and sets its own scope as the default. (Normally, ECN4Web is in the same scope as ECN4.) | If not present and ACTIVE, ECN4Web is not running. |
| N4RestWebService | (administrator-assigned N4 node name) | When configured with the load balancer, N4 creates a single service. When configured without a load balancer (recommended only for testing purposes), N4 creates a separate N4RestWebService for each network node. The geodetic service. A non-unique network node initialized based on the network topology of the current N4 installation. Spatial bin information is available using the REST web service. ECN4 uses this web service to get that information. Unlike other services, the N4RestWebService is a virtual node. For that reason, it requires configuration in the N4 Settings view. See the four settings with IDs ARGORESTWEBSERVICE001 (LOAD_BALANCER_ENABLED) through 004. | If not present and ACTIVE, a geodetic service is not running. |
| XMLRDTService | ecn4 | The ECN4 daemon's listener for XMLRDT messages. If ACTIVE, shows the address/port ECN4 listens on for XMLRDT. | If not present and ACTIVE, ECN4 does not accept XMLRDT messages. |
| ECN4BentoServerService | ecn4 | The ECN4 daemon's listener for messages from XPS and XPS clients. If ACTIVE, shows the address/port ECN4 listens on for bento messages from XPS clients (for example, dispatch and CHE reset) and from XPS (for example, load/discharge EC events). | If not present and ACTIVE, ECN4 does not accept bento messages, and these will fail. |
| KafkaServer | (administrator-assigned name of the Kafka broker) | One of the hosts for Apache Kafka messaging platform. | If not present and ACTIVE, that Kafka broker is inactive or disconnected. If other KafkaServer hosts are active, N4 is running. |

* A service that is in a state other than ACTIVE (SHUTDOWN, INITIALIZING, CONNECTED, LOADING, etc.) is as unusable as if it were not present.

** Services are free to define their own states. Please contact your Navis representative if you have questions about the meaning of unlisted state labels.
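The rule in the first note, that any state other than ACTIVE counts the same as the service being absent, can be expressed as a simple check. The following Python sketch is illustrative only: the function name and the `(type, status)` row layout are hypothetical, and which service types a site treats as required is an assumption left to the caller.

```python
def unavailable_services(rows, required_types):
    """Return the required service types that have no ACTIVE row.

    rows: iterable of (service_type, status) pairs, for example copied
    from the Type and Status columns. A type counts as available only
    if at least one of its rows is ACTIVE; any other state is treated
    the same as the service not being present.
    """
    active_types = {service_type for service_type, status in rows if status == "ACTIVE"}
    return sorted(t for t in required_types if t not in active_types)
```

For example, if BridgeDaemon is ACTIVE, XPSDaemon is LOADING, and BridgeService has no row, then checking for those three types reports BridgeService and XPSDaemon as unavailable.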